Abstract. We describe new techniques for model-based recognition of 3-D objects from
unknown viewpoints using single gray scale images. The objects in the scene may be
overlapping and partially occluded. Efficient matching algorithms, which assume an affine
approximation to the perspective viewing transformation, are proposed. The presentation is
currently  restricted  to  flat  rigid  3-D  objects.  Point,  line  and  curve  matching  algorithms  are 
presented.  The  paper  especially  emphasizes  the  curve  matching  problem.  Experimental 
results  are  included. 

1.   Introduction. 

Object recognition is a major task in computer vision. Most practical recognition
systems are model-based systems, where the task is to match known models
against an image of a scene (see a survey in [1]). In this paper we address the
recognition problem of overlapping and partially occluded objects in composite
scenes. The 3-D positions of the objects are arbitrary and no restriction on the
viewing angle of the camera is assumed. We concentrate on recognition of flat rigid
objects. However, our method can be extended to general objects and part of our
future experiments will concentrate on this direction. The recognition is done from
2-D intensity images. We assume the affine approximation to the viewing transformation
(see discussion in Section 2).

Since  we  are  concerned  with  recognition  of  partially  occluded  objects,  no  use  of 
global  features  can  be  made.  We  represent  our  objects  by  sets  of  local  features. 
These  features  can  be  points,  line  segments,  curve  segments,  etc.  In  this  paper  we 
briefly review our previous results on object recognition by affine invariant point
matching  ([2])  and  concentrate  on  new  techniques  based  on  affine  invariant  curve 
matching. 

Much  work  was  done  on  recognition  of  overlapping  planar  objects  in  2-D 
scenes  (see,  for  example,  [3-7]).  Additional  examples  can  be  found  in  [1].  A 
comprehensive  survey  of  existing  3-D  object  recognition  systems  is  given  in   [1,8]. 

Recent results which are not listed in the above mentioned surveys and are
relevant to this paper are [9-12]. While [9-11] concentrate on recognition utilizing
points and lines, the approach of [12] is based on affine invariant boundary curve
matching.

The SCERPO system of Lowe ([9,13]) recognizes 3-D objects from 2-D intensity
scenes. It is viewpoint invariant, and is based on perceptual grouping of object
features. At its present stage, however, the SCERPO system is mainly suited for
polyhedral scenes. Thompson and Mundy ([10]) assume the affine approximation to
the perspective transformation, and use a clustering approach to recover the
transformation between groups of model vertices and edges and the appropriate
groups in the image. Our method is computationally more efficient and does not
require special edge groupings. Huttenlocher and Ullman ([11]) also use the affine
approximation to the perspective transformation. They recognize flat objects by
alignment of point triplets. Since such an approach may result in a high complexity
algorithm, they emphasize the classification of the model and image points to reduce
the complexity of matching, while the matching algorithm itself is straightforward.
Our approach in the point matching algorithm (see Section 4) is somewhat complementary
to that of [11]. We consider the case where no effective point classification
can be done (this is also the assumption in [10]), hence the matching algorithm itself
has to be efficient. The recognition applies geometric constraints to develop an efficient
feature matching algorithm. The algorithm processes the models and the scene
images independently, allowing fast recognition. If feature classification is possible,
it can be incorporated into our algorithm in a natural way to improve its efficiency.
For example, our non-convex curve matching algorithm (see Section 6.1)
uses triplets of points based on stable affine invariant features (the so-called
concavity entrances). Since these features are distinct, their use significantly reduces
the complexity of the matching algorithm.

Cyganski and Orr ([12,14]) develop an affine invariant curve matching algorithm
for object recognition. Their methods use global region information, and are
inapplicable to recognition of occluded objects. A recent affine invariant curve
matching method by Hong and Tan ([15]) is also applicable only to unoccluded
curves. To overcome this problem, we propose affine invariant curve matching
techniques, which are based on local features and hence can deal with occlusion.

This  paper  reports  preliminary  results  in  a  series  of  experiments  to  develop  a 
3-D  object  recognition  system  from  2-D  images,  based  on  efficient  point,  line,  and 
curve  matching  procedures.  The  paper  is  organized  as  follows.  Section  2  states  the 
recognition  problem.  Section  3  states  some  well  known  facts  about  affine  invariant 
planar  point  set  representation.  Sections  4  and  5  review  our  point  and  line  matching 
algorithms  (for  more  details  see  [2]).  Section  6  discusses  affine  invariant  curve 
matching,  both  in  the  non-convex  and  convex  case.  This  section  is  the  main  novel 
contribution  of  the  paper.  Section  7  states  the  results  of  our  least  squares  point 
matching  algorithm  (see  [2]).  Finally,  section  8  describes  experimental  results,  and 
section 9 sets some future goals. The algorithms which we describe were actually
tested in "real life situations" by recognition of objects comprising composite scenes
(see Figs. 1, 4, 10, 15, 17).


Figure  1  :  2  pliers  and  their  composite  scene.  (Observe  different  lengths  of  handles  in 
the  composite  scene  due  to  tilt.) 


2.   Definition  of  the  Problem 

We consider the problem of model based object recognition in a cluttered 3-D
scene. The objects in the scene may overlap and also be partially occluded by other
(unknown) objects (see, for example, Figs. 1c, 4c, 10c). The objects are assumed
to be almost flat (or to have flat surfaces); however, their position in space may be
arbitrary. We also assume that the depth of the centroids of the objects in the scene
is large compared to the focal length of the camera, and that the depth variation of
the objects is small compared to the depth of their centroids. Under these assumptions
it is well known that the perspective projection is well approximated by a
parallel (orthographic) projection with a scale factor (see for example p. 79 in [16],
or [10, 17]). Hence, two different images of the same flat object are in an affine 2-D
correspondence, namely there is a non-singular 2x2 matrix A and a 2-D (translation)
vector b, such that each point x in the first image is mapped to the
corresponding point Ax + b in the second image.

Our  problem  is  to  find  the  identity  of  the  objects  in  the  scene  and  the  affine 
transformation  between  their  location  in  the  scene  and  their  stored  models. 

3.   Affine  Invariant  Representation  of  Planar  Point  Sets 

It is well known that an affine transformation of the plane is uniquely defined
by the transformation of three non-collinear points (see, for example, [18]). Moreover,
there is a unique affine transformation, which maps any non-collinear triplet
in the plane to another non-collinear triplet. Suppose we are given a set of points in
the plane together with three non-collinear points, which we will denote $e_{00}$, $e_{10}$,
$e_{01}$. Using this triplet of points as an affine basis in the plane one can express the
affine coordinates $(\xi, \eta)$ of an arbitrary point $v$ as

$$v = \xi (e_{10} - e_{00}) + \eta (e_{01} - e_{00}) + e_{00}$$

Application of an affine transformation $T$ will transform the point $v$ to

$$Tv = \xi (Te_{10} - Te_{00}) + \eta (Te_{01} - Te_{00}) + Te_{00}$$

Hence, $Tv$ has the same coordinates $(\xi, \eta)$ in the basis triplet $Te_{00}$, $Te_{10}$, $Te_{01}$
(the transformed set of points remains non-collinear).

Accordingly,  object  points  will  have  the  same  coordinates  relative  to  the  same 
(transformed)  triplet  of  non-collinear  points,  regardless  of  the  affine  transformation 
which was applied to the point set. One can define a normalized affine object by
transforming the basis triplet $e_{00}$, $e_{10}$, $e_{01}$ to (0,0), (1,0), (0,1) respectively. In
this  normalized  version  similar  point  sets  are  represented  by  similar  coordinate  sets. 
Moreover,  this  representation  allows  comparison  of  occluded  objects,  since  the 
coordinates  of  an  occluded  object  will  have  a  partial  overlap  with  the  coordinates  of 
the  stored  model. 
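The invariance argument above can be checked numerically. The following is a minimal sketch, not part of the original system: the function names `affine_coords` and `apply_affine`, the sample points, and the sample transformation are all illustrative assumptions. Exact rational arithmetic is used so that the two coordinate pairs can be compared for equality.

```python
from fractions import Fraction


def affine_coords(v, e00, e10, e01):
    """Return (xi, eta) such that v = xi*(e10-e00) + eta*(e01-e00) + e00."""
    ax, ay = e10[0] - e00[0], e10[1] - e00[1]
    bx, by = e01[0] - e00[0], e01[1] - e00[1]
    det = ax * by - ay * bx
    if det == 0:
        raise ValueError("basis triplet is collinear")
    dx, dy = v[0] - e00[0], v[1] - e00[1]
    # Cramer's rule for the 2x2 system [a b] (xi, eta)^T = d
    xi = (dx * by - dy * bx) / det
    eta = (ax * dy - ay * dx) / det
    return xi, eta


def apply_affine(A, b, p):
    """Map the point p to A p + b (A is a 2x2 matrix given as nested tuples)."""
    return (A[0][0] * p[0] + A[0][1] * p[1] + b[0],
            A[1][0] * p[0] + A[1][1] * p[1] + b[1])


F = Fraction
# Hypothetical sample data: a basis triplet, one extra point, and a
# non-singular affine map chosen only for illustration.
e00, e10, e01, v = (F(1), F(1)), (F(4), F(2)), (F(2), F(5)), (F(3), F(4))
A, b = ((F(2), F(1)), (F(-1), F(3))), (F(5), F(-2))

before = affine_coords(v, e00, e10, e01)
after = affine_coords(*(apply_affine(A, b, p) for p in (v, e00, e10, e01)))
print(before == after)  # True: the coordinates are unchanged by the map
```

With floating-point input the two coordinate pairs agree only up to rounding, which is one reason the hash-table scheme of Section 4 quantizes the coordinates.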

The  above  mentioned  representation  is,  however,  dependent  on  the  specific 
point  triplet  which  was  chosen  as  an  affine  basis.  One  way  to  make  it  a  plausible 
representation  is  to  choose  as  its  basis  (or  bases)  points  belonging  to  an  affine 
invariant stable, easily identifiable feature (features). Our non-convex curve matching
procedure which is described in Section 6.1 is based on such an approach. This
approach has the advantage of reduced complexity. However, it may preclude
recognition of occluded objects, if the required features do not appear in the scene.
Another approach is to represent the object points in all possible affine bases. This
is the approach taken by us for recognition of objects represented by sets of interest
points in [2], which we briefly review in Section 4, and in matching of convex
curves, which is described in Section 6.2.

4.   Recognition by Point Matching

In the situation described here each model object is represented by a set of
interest points. These points should be invariant under affine transformations. In our
experiments we used points of sharp convexities and deep concavities (see Fig. 2).
These are either points where the tangent does not exist or points with high curvature.
The first type of points is preserved under affine transformation. The second
type is theoretically not necessarily preserved under an affine transformation. However,
such points will be preserved in most practical cases, when the change in the viewing
angle is not dramatic. If a certain sharp convexity is not
preserved, this does not affect our algorithm, since it can deal with missing points. A
similar set of interest points is extracted in the composite scene where the objects
have to be recognized.


Figure 2 : Extracted interest points in the images of Fig. 1.


As  was  mentioned  before,  an  affine  transformation  is  uniquely  defined  by  the 
transformation  of  three  non-collinear  points  in  the  plane.  Hence,  we  can  try  to 
match  non-collinear  triplets  of  model  points  against  non-collinear  scene  triplets  to 
obtain  candidate  affine  transformations.    Each  such  transformation  can  be  checked 
by matching the transformed model against the scene. This is also the basic
approach  in  [11]. 

However, the complexity of such a scheme is quite unfavorable. Given m
points in the model and n points in the scene, the worst case complexity is
$(m \times n)^3 \times t$, where t is the complexity of verifying the model against the scene. If
we assume that m and n are of the same magnitude, and t is at least of magnitude m,
the worst case complexity is of order $n^7$. One way to reduce this complexity ([11])
is to classify the points in a distinctive way, so that each triplet can match only a
small number of other triplets. We consider, however, the situation where such a
distinction does not exist or is not enough for significant reduction of complexity
(see the discussion in [10]). Hence, we present a more efficient triplet matching
algorithm.

The  algorithm  consists  of  two  major  steps.  The  first  one  is  an  affine  invariant 
model  representation  step.  This  step  does  not  use  any  information  about  the  scene 
and  is  executed  off-line  before  actual  matching  is  attempted.  The  second  step, 
matching  proper,  uses  the  data  prepared  by  the  first  step  to  match  the  models 
against  the  scene.  The  execution  time  of  this  second  step  is  the  actual  recognition 
time. 

As  was  mentioned  in  Section  3,  given  a  basis  triplet  each  point  can  be 
represented  as  a  coordinate  pair  in  this  basis.  Since  we  have  no  preference  for  any 
specific  basis  triplet  (and  we  allow  occlusion),  the  set  of  model  points  will  be 
represented  in  all  possible  non-collinear  triplet  bases.  Our  task  will  be  to  recognize 
a  model  in  the  scene,  represented  in  one  of  the  possible  basis  triplets. 

A)  Model  Representation  and  Hash-table  Formation 

Assume we are given an image of a model, where m interest points have been
extracted. For each ordered non-collinear triplet of model points the coordinates
of all other m - 3 model points are computed taking this triplet as an
affine basis of the 2-D plane. Each such coordinate (after a proper quantization)
is used as an entry to a hash-table, where we record the basis-triplet at
which the coordinate was obtained and the appropriate model. The complexity
of this preprocessing step is of order $m^4$ per model. New models added to the
data-base can be processed independently without recomputing the hash-table.

B)  Recognition 

In  the  recognition  stage  we  are  given  an  image  of  a  scene,  where  n  interest 
points  have  been  extracted.  We  choose  an  arbitrary  ordered  triplet  in  the  scene 
and  compute  the  coordinates  of  the  scene  points  taking  this  triplet  as  an  affine 
basis.  For  each  such  coordinate  we  check  the  appropriate  entry  in  the  hash- 
table,  and  for  every  pair  (model,  basis-triplet),  which  appears  there,  we  tally  a 
vote  for  the  model  and  the  basis-triplet  as  corresponding  to  the  triplet  which 
was chosen in the scene. (If there is only one model, we have to vote for the
basis triplet alone).

If  a  certain  pair  (model,  basis-triplet)  scores  a  large  number  of  votes,  we  decide 
that  this  triplet  corresponds  to  the  one  chosen  in  the  scene.  The  uniquely 
defined  affine  transformation  between  these  triplets  is  assumed  to  be  the 
transformation  between  the  model  and  the  scene.  The  model  edges  are  then 
verified  against  the  scene.  If  the  current  triplet  does  not  score  high  enough,  we 
pass  to  another  basis-triplet  in  the  scene. 
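Steps A) and B) can be sketched together as follows. This is an illustrative sketch under stated assumptions, not the implementation used in the experiments: the function names, the uniform quantization step `step`, the collinearity tolerance, and the use of a Python dictionary as the hash-table are all assumptions, and the final edge verification is omitted.

```python
from collections import Counter, defaultdict
from itertools import permutations


def affine_coords(v, e00, e10, e01):
    """Coordinates of v in the affine basis (e00, e10, e01); None if collinear."""
    ax, ay = e10[0] - e00[0], e10[1] - e00[1]
    bx, by = e01[0] - e00[0], e01[1] - e00[1]
    det = ax * by - ay * bx
    if abs(det) < 1e-9:  # assumed tolerance for a degenerate basis
        return None
    dx, dy = v[0] - e00[0], v[1] - e00[1]
    return ((dx * by - dy * bx) / det, (ax * dy - ay * dx) / det)


def quantize(c, step):
    """Assumed quantization: round each coordinate to a uniform grid."""
    return (round(c[0] / step), round(c[1] / step))


def build_table(models, step=0.1):
    """Step A: models maps a model name to its list of interest points."""
    table = defaultdict(list)
    for name, pts in models.items():
        for basis in permutations(range(len(pts)), 3):  # ordered triplets
            e = [pts[i] for i in basis]
            for j, v in enumerate(pts):
                if j in basis:
                    continue
                c = affine_coords(v, *e)
                if c is None:  # collinear triplet: not a basis
                    break
                table[quantize(c, step)].append((name, basis))
    return table


def vote(table, scene, scene_basis, step=0.1):
    """Step B: tally votes for (model, basis-triplet) pairs for one scene basis."""
    e = [scene[i] for i in scene_basis]
    votes = Counter()
    for j, v in enumerate(scene):
        if j in scene_basis:
            continue
        c = affine_coords(v, *e)
        if c is None:
            break
        for entry in table.get(quantize(c, step), ()):
            votes[entry] += 1
    return votes
```

A caller would invoke `vote` for successive scene triplets and pass any (model, basis-triplet) pair with a high count on to verification. The sketch looks up only the exact quantized bin; a practical version would likely also inspect neighboring bins to tolerate noise near bin boundaries.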

For the algorithm to be successful it is enough, theoretically, to pick three non-collinear
points in the scene, belonging to some model. The voting process, per triplet,
is linear in the number of points in the scene. Hence, the overall recognition
time is dependent on the 'density' of model points in the scene. Although, in the
worst case, we might have an order of $n^4$ operations, in most cases, especially when
the number of models is small, the algorithm will be much faster. For example, if
we want to recognize a model, having k model points in a scene of n points, then the
probability of not choosing a scene triplet belonging entirely to this model in t trials
is approximately

$$p = \left(1 - \left(\frac{k}{n}\right)^3\right)^t$$

Assume a lower bound $d$ on the 'density' $\frac{k}{n}$ of the points belonging to a model
in the scene (this simply means that the scene has 'enough' information to enable
recognition of the models). Then, for a given $\epsilon > 0$, the number of trials $t$ giving
$p < \epsilon$ is of order $\frac{\log \epsilon}{\log(1 - d^3)}$, which is a constant independent of n. Since the
verification process is linear in n, we have, in this case, an algorithm of complexity
O(n), which will succeed with probability of at least $1 - \epsilon$.
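As a worked illustration of this bound, the short sketch below computes the number of trials needed for a given density and failure probability. The function names and the sample values (density 1/4, failure probability 0.01) are assumptions chosen only for illustration.

```python
import math


def miss_probability(k, n, t):
    """Approximate probability that t random scene triplets all miss the model."""
    return (1.0 - (k / n) ** 3) ** t


def trials_needed(d, eps):
    """Smallest integer t with (1 - d**3)**t < eps, for density bound d."""
    t = math.ceil(math.log(eps) / math.log(1.0 - d ** 3))
    # Guard against the boundary case where rounding leaves equality.
    while (1.0 - d ** 3) ** t >= eps:
        t += 1
    return t


# Hypothetical example: a quarter of the scene points belong to the model.
t = trials_needed(0.25, 0.01)
print(t, miss_probability(1, 4, t))
```

For these sample values the bound evaluates to 293 trials, regardless of how many points the scene contains, which is the sense in which the number of trials is a constant independent of n.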


Figure 3 : Recognition result of the scene in Fig. 1c.

This method assumes no a-priori classification of the model and scene points to
achieve matching candidates. If such information is available, it can be incorporated
into our method by assigning weights to the correspondence of different triplets to
the model, and by checking the triplets in an appropriate order. For example, our
non-convex curve matching algorithm (see Section 6.1) uses triplets of points based
on stable affine invariant features (the so-called concavity entrances). In this case
the matching algorithm is linear in the number of the sample points on a curve.
A  major  potential  advantage  of  the  suggested  algorithm  is  its  high  inherent 
parallelism.  Parallel  implementation  of  this  algorithm  is  straightforward;  moreover, 
it  should  be  quite  easy  to  build  a  special  device  for  this,  implementing  it  at  very  high 
speed. 

For  further  discussion  of  reduction  of  complexity  incorporating  various  affine 
invariants  see  [2]. 

5.  Line  Matching 

In  the  previous  sections  we  dealt  with  point  matching  algorithms.  However, 
extraction  of  points  might  be  somewhat  noisy.  A  line  is  usually  a  more  stable 
feature than a point. Thus in scenes where lines can be extracted in a reliable way,
e.g. scenes of polyhedral objects, we might be interested in applying similar procedures
to lines.

All the point matching techniques of Section 4 apply directly to lines, since
lines can be viewed as points in the dual space. Thus three lines which have no
parallel pair are a basis of the affine space, each line has unique
coordinates in this basis, and we repeat exactly the matching procedure of Section 4.

We can also make use of line segments to reduce the complexity of the matching
algorithm of Section 4. If the endpoints of line segments can be reliably
extracted, then instead of a triplet of points or lines as a basis, we can take a line
segment plus an additional point. This reduces the complexity by a factor of n.

Since  an  affine  transformation  maps  collinear  points  into  collinear  points  and 
points  of  line  intersection  into  points  of  the  same  line  intersection,  we  may  develop 
algorithms  which  combine  point  and  line  information.  For  example,  even  if  the 
algorithm  utilizes  point  triplets  as  an  affine  basis,  the  recognition  can  be  done  not 
only  on  other  interest  point  coordinates,  but  also  on  line  equations,  etc. 

6.  Curve  Matching 

Since the shape of planar rigid bodies is completely described by their boundary
curves, object recognition can be accomplished by matching these curves. For
example, the objects in Fig. 4 can be recognized by matching the boundary curve of
the composite scene in Fig. 4c against the boundary curves of the objects in Fig. 4a
and Fig. 4b.


Figure 4 : A pizza-cutter, a spatula and their composite scene.

Matching  of  curves  which  have  undergone  affine  transformation  was  discussed 
in  the  works  of  Cyganski  and  Orr  ([12,14]).  Their  methods,  however,  require 
knowledge  of  the  full  curve,  hence  are  unable  to  deal  with  occlusion  such  as  in  Fig. 
4.  The  method  described  in  this  section  is  based  on  local  affine  invariant  features 
enabling  recognition  of  partially  occluded  objects. 

The  curves  are  represented  by  vertices  of  their  polygonal  approximations  (the 
details  of  the  approximation  procedure  are  outlined  in  Section  6.1.1).  We  discuss 
separately  the  cases  of  non-convex  and  convex  curves. 


6.1.   Non-convex  Curve  Matching 

It is quite an obvious observation that most of the flat objects encountered by
us do not have a convex boundary. Although this makes them geometrically more
complicated, as was already pointed out in [19], non-convex planar shapes are sometimes
easier to handle than convex shapes. In the case of an affine transformation
each concavity supplies us with a stable feature from which the affine
transformation can be recovered. Specifically, consider the sketch of Fig. 5. The
concavity depicted there is bounded by a single segment of the convex hull which we
call (following [19]) the concavity entrance. It is a simple geometric observation
that the concavity entrance is invariant under affine transformations. This follows
immediately from the following affine invariants:

a)  a  curve  which  is  on  one  side  of  a  line  is  mapped  into  the  same  situation; 

b)  lines  are  mapped  to  lines; 

c)  coincidence  points  between  a  line  and  a  curve  are  mapped  to  such  coincidence 
points,  and  their  order  on  the  line  is  preserved. 

An  additional  point  which  is  invariant  under  affine  transformations  is  the  concavity 
point  most  distant  from  the  concavity  entrance  line  (if  this  point  is  not  unique 
choose  the  leftmost  point).  This  follows  from  the  fact  that  the  above  mentioned 
point  lies  on  the  most  distant  line  parallel  to  the  concavity  entrance,  which  still 
touches  the  boundary  curve  of  the  concavity.  Thus,  one  can  extract  a  concavity 
based  point  triplet  which  is  affinely  invariant.  This  basis  triplet  can  be  used  in  a 
recognition  scheme  (see  Section  6.1.1). 


Figure 5 : a) a concavity and its entrance; b) the basis triplet.


The concavity entrances are computed as follows. First the convex hull of a
polygonal approximation of the boundary curve is computed. The concavity
entrance endpoints are those convex hull point pairs which are separated by polygon
vertices not belonging to the convex hull. The computation of the (leftmost) boundary
point most distant from the concavity entrance is obvious. The above mentioned
procedure is of complexity O(n), where n is the number of polygon vertices
(see [20], p. 93). Since this algorithm is based on the convex hull extraction it is
fairly robust. Fig. 6 shows the concavity entrances of the objects and the composite
scene of Fig. 4.

Figure 6 : The concavity entrances of the objects in Fig. 4.
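The extraction of concavity entrances and their basis triplets can be sketched as below. Several points are assumptions of this sketch rather than statements about the original implementation: the convex hull is computed with the generic monotone-chain method, which is O(n log n), instead of the O(n) simple-polygon algorithm cited above; vertices are assumed to be tuples listed in boundary order; ties for the most distant point are broken by boundary order rather than by taking the leftmost point; and boundary vertices lying exactly on a hull edge are not treated as hull vertices.

```python
def cross(o, a, b):
    """Twice the signed area of the triangle (o, a, b)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Monotone-chain convex hull; returns the hull vertices."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def concavity_triplets(polygon):
    """Return a list of (entrance_start, entrance_end, most_distant_point)."""
    hull = set(convex_hull(polygon))
    n = len(polygon)
    on_hull = [i for i in range(n) if polygon[i] in hull]
    triplets = []
    # Walk consecutive hull vertices in boundary order, wrapping around.
    for a, b in zip(on_hull, on_hull[1:] + [on_hull[0] + n]):
        inner = [polygon[i % n] for i in range(a + 1, b)]
        if not inner:
            continue  # adjacent hull vertices: no concavity between them
        p, q = polygon[a], polygon[b % n]
        # |cross| is proportional to the distance from the entrance line.
        far = max(inner, key=lambda v: abs(cross(p, q, v)))
        if cross(p, q, far) == 0:
            continue  # degenerate: all inner vertices lie on the entrance
        triplets.append((p, q, far))
    return triplets


# Hypothetical example: a square with a V-shaped notch cut into its top side.
notched = [(0, 0), (4, 0), (4, 4), (2, 1), (0, 4)]
print(concavity_triplets(notched))
```

The numerical-reliability thresholds described in Section 6.1.1 (minimum height-to-length ratio of the triplet, minimum concavity length) are not applied in this sketch.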


6.1.1.   Concavity  Entrance  Based  Recognition 

The previous discussion suggests the following straightforward scheme for
recognition of partially occluded non-convex objects in composite scenes:

Preprocessing

[a] Extract the boundary curves of the model objects. Smooth the curves (see
[21]) by inflating them to a narrow strip defined by a certain threshold value ε,
and then find the shortest polygonal path lying in this 2ε-wide strip. The set of
vertices of this polygonal path represents the model object.

[b] Find the convex hull of the above mentioned vertex set, and extract the concavities
and their corresponding basis triplets. (To have a numerically reliable
basis, we have to demand that the basis points should not be 'almost collinear',
namely, the ratio between the height of the third point above the concavity
entrance and the length of the entrance should be above a certain threshold.
Also the concavity boundary curve should not be too short.)

Recognition

[a]  +  [b] 

Given  the  boundary  curve  of  the  scene  apply  to  it  the  procedures  [a]  and  [b]  of 
the  preprocessing  stage. 

[c] Choose a basis triplet corresponding to a concavity in the scene and a basis triplet
corresponding to a concavity of a model. Compute the affine transformation
mapping one basis into another. Verify whether a large set of the model
vertices match the scene. If not, try another concavity pair.

Fig.  7  shows  the  recognition  results  for  the  scene  of  Fig.  4c. 


Figure  7:  Recognition  result  of  the  scene  in  Fig.  4c. 
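Step [c] of the recognition stage, computing the affine transformation between two basis triplets and counting matching model vertices, can be sketched as follows. The function names, the distance tolerance `tol`, and the brute-force nearest-vertex check are assumptions made for illustration; the source does not specify how the verification is carried out.

```python
def affine_from_triplets(src, dst):
    """Return (A, b) such that A p + b maps src[i] to dst[i] for i = 0, 1, 2."""
    (x0, y0), (x1, y1), (x2, y2) = src
    (X0, Y0), (X1, Y1), (X2, Y2) = dst
    ux, uy, vx, vy = x1 - x0, y1 - y0, x2 - x0, y2 - y0
    det = ux * vy - uy * vx
    if det == 0:
        raise ValueError("source triplet is collinear")
    # Inverse of the matrix S whose columns are (src1 - src0) and (src2 - src0).
    inv = ((vy / det, -vx / det), (-uy / det, ux / det))
    # Matrix D whose columns are (dst1 - dst0) and (dst2 - dst0); A = D * S^-1.
    D = ((X1 - X0, X2 - X0), (Y1 - Y0, Y2 - Y0))
    A = tuple(tuple(sum(D[r][k] * inv[k][c] for k in range(2))
                    for c in range(2)) for r in range(2))
    b = (X0 - (A[0][0] * x0 + A[0][1] * y0),
         Y0 - (A[1][0] * x0 + A[1][1] * y0))
    return A, b


def apply_affine(A, b, p):
    """Map the point p to A p + b."""
    return (A[0][0] * p[0] + A[0][1] * p[1] + b[0],
            A[1][0] * p[0] + A[1][1] * p[1] + b[1])


def count_matches(model, scene, A, b, tol=0.5):
    """Number of transformed model vertices within tol of some scene vertex."""
    hits = 0
    for p in model:
        q = apply_affine(A, b, p)
        if any((q[0] - s[0]) ** 2 + (q[1] - s[1]) ** 2 <= tol * tol
               for s in scene):
            hits += 1
    return hits
```

A candidate pairing would be accepted when `count_matches` exceeds some fraction of the model vertices; that acceptance threshold is likewise a free parameter of this sketch.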

The worst case complexity of the recognition stage is $O(m \times k^2 \times n)$, for m
models, O(k) concavities on a boundary curve, and O(n) vertices on such a curve.
One may speed up the above mentioned algorithm by comparing only 'almost similar'
concavities. To do that we compute a concavity shape signature, or a so-called
footprint (see [6,22]).


6.1.2.   Concavity  Footprints 

Given different concavities one would like to find an affine invariant comparison
of their shapes. In order to do that, first normalize a concavity (see Section
3) by applying the transformation which maps its triplet basis to a standard equilateral
triangle, e.g., the concavity endpoints are mapped to (-1,0), (1,0), and the
third point to $(0, \sqrt{3})$. Now all the concavities are expressed in an affine invariant
way and their 'normal' shapes can be compared. To each such 'normal' shape we
assign a number or a vector of numbers which we call a 'footprint', and which
represents the shape of the 'normalized' concavity. A footprint should possess the
following properties:

(i)     Continuity - close shapes should have close footprints.

(ii)    Separability - significantly different shapes should have different footprints.

(iii)   Easy computability.

Since the shapes are already in a normalized form it is easy to suggest numerous
footprint schemes. One such example is illustrated in Fig. 8. For some constant s
(say $5 \le s \le 10$), divide the upper half plane by s + 1 rays from the origin with angle
$\frac{\pi}{s}$ between two consecutive rays. Let $a_i$ be the area of the 'normalized' shape
between rays $i$ and $i+1$. The s-dimensional shape footprint will be
$(a_1, a_2, \ldots, a_s)$.


Figure  8:    An  example  of  a  shape  footprint. 
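The sector-area footprint can be sketched for a polygonal normalized shape as follows. This is an assumed realization, not the computation used for the reported figures: each sector is obtained by clipping the polygon against the two half-planes bounded by consecutive rays (a Sutherland-Hodgman style clip), and the area is then taken with the shoelace formula. The function names are hypothetical.

```python
import math


def clip_half_plane(poly, nx, ny):
    """Keep the part of poly where nx*x + ny*y >= 0 (line through the origin)."""
    out = []
    for i in range(len(poly)):
        p, q = poly[i], poly[(i + 1) % len(poly)]
        dp = nx * p[0] + ny * p[1]
        dq = nx * q[0] + ny * q[1]
        if dp >= 0:
            out.append(p)
        if (dp > 0 and dq < 0) or (dp < 0 and dq > 0):
            t = dp / (dp - dq)  # edge crosses the clipping line at parameter t
            out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out


def area(poly):
    """Polygon area by the shoelace formula."""
    n = len(poly)
    return abs(sum(poly[i][0] * poly[(i + 1) % n][1]
                   - poly[(i + 1) % n][0] * poly[i][1] for i in range(n))) / 2.0


def footprint(shape, s=8):
    """Areas of a normalized polygonal shape in s equal sectors of the upper half plane."""
    fp = []
    for i in range(s):
        a0, a1 = math.pi * i / s, math.pi * (i + 1) / s
        # Counter-clockwise side of ray a0, then clockwise side of ray a1.
        part = clip_half_plane(shape, -math.sin(a0), math.cos(a0))
        part = clip_half_plane(part, math.sin(a1), -math.cos(a1))
        fp.append(area(part) if len(part) >= 3 else 0.0)
    return fp


# Hypothetical example: the normalized basis triangle itself.
triangle = [(-1.0, 0.0), (1.0, 0.0), (0.0, math.sqrt(3))]
print(footprint(triangle, s=8))
```

Because the sectors partition the upper half plane, the entries of the footprint sum to the area of the normalized shape, which gives a simple sanity check on any implementation.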

A demonstration of some of the normalized footprints of the shapes of Fig. 6 is
given in Fig. 9. These footprints are 8-dimensional. Note that Fig. 9a and Fig. 9b
represent the normalized shape of the same concavity, one on the model and the
second in the composite scene, while Fig. 9c represents a different normalized concavity.
It is easy to see the resemblance of the first two shapes compared to the
third. The corresponding numerical values of the footprints are as follows: the footprint
of Fig. 9a is (0.265, 0.212, 0.276, 0.683, 0.789, 0.388, 0.297, 0.298); the
footprint of Fig. 9b is (0.282, 0.230, 0.303, 0.704, 0.737, 0.344, 0.259, 0.290); the
footprint of Fig. 9c is (0.229, 0.293, 0.578, 0.912, 0.807, 0.341, 0.266, 0.293). It
is obvious that different and more accurate footprint schemes can be designed to differentiate
between 'normalized' concavities. At this stage of our research we made
no attempt to design an 'optimal' footprint.

Figure 9: a) Footprint of concavity i in Fig. 6a. b) Footprint of concavity i in composite
scene of Fig. 6c. c) Footprint of concavity ii in composite scene of Fig. 6c.
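The resemblance between the footprints of Fig. 9 can be quantified with a simple distance. The sketch below uses the Euclidean distance, which is an assumption of this illustration; the text does not prescribe a particular metric. The footprint values are the ones quoted above.

```python
import math

# Footprint values quoted in the text for Fig. 9a, 9b and 9c.
FIG_9A = (0.265, 0.212, 0.276, 0.683, 0.789, 0.388, 0.297, 0.298)
FIG_9B = (0.282, 0.230, 0.303, 0.704, 0.737, 0.344, 0.259, 0.290)
FIG_9C = (0.229, 0.293, 0.578, 0.912, 0.807, 0.341, 0.266, 0.293)


def footprint_distance(f, g):
    """Euclidean distance between two footprints of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(f, g)))


print(round(footprint_distance(FIG_9A, FIG_9B), 3))  # 0.089 (same concavity)
print(round(footprint_distance(FIG_9A, FIG_9C), 3))  # 0.394 (different concavity)
```

Under this metric the two views of the same concavity are roughly four times closer to each other than either is to the different concavity, consistent with the continuity and separability properties required of a footprint.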


The  recognition  algorithm  of  Section  6.1.1  can  now  be  improved  as  follows: 

Preprocessing 

[a]  +  [b] 

These  steps  are  the  same  as  in  6.1.1. 

[c] For each concavity of the model object compute its footprint. Use this footprint
as an entry to a hash-table and record there the model and the concavity
for which this footprint was obtained.

Recognition 

[a]  +  [b] 

For  the  boundary  curve  of  the  scene  apply  [a]  and  [b]  as  before. 

[c] Choose a concavity in the scene, and compute its footprint. For this footprint
(properly quantized) check the appropriate entry (and neighboring close
entries) in the hash table, and extract the pairs (model, concavity) appearing
there. For each such relevant model with the appropriate concavity, compute
the corresponding affine transformation and execute the verification procedure
as in the algorithm of the previous section.

If the concavity entrances are distinctive enough, the worst case complexity of the
recognition stage will be this time linearly dependent on the number of concavities
in the scene and the number of the scene vertex points, namely, $O(k \times n)$.

6.2.    Convex  Curve  Matching 

In case one is confronted with convex objects, such as in Fig. 10, or with a
non-convex object with occluded concavity entrances, the previous method will not
work. (It is interesting to point out that in this case the point classification method
of [11] is also inapplicable.) In this subsection we show how our framework can be
expanded in a straightforward manner to allow recognition of convex objects.

Figure 10: Two convex shapes and their composite scene.


For a convex object we have no natural affine bases such as the ones defined by the
concavities of non-convex regions (although we do, usually, have such natural bases
in the composite scene, as will be pointed out later). Hence we will resort to a
technique similar to that in Section 4, and exploit all possible bases. In Fig. 11 we
depict a convex curve. Take two points on the boundary and join them with a line,
which will be called the base line. It defines two most distant curve points from the
base line, one on each side (if this point on one side is not defined uniquely, take
the leftmost such point). Each most distant point together with the endpoints of the
base line forms an affine basis triplet (see Fig. 11). To each such basis triplet
corresponds a convex region which will be called the basis region.
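
As a rough sketch of this construction, assume the boundary is given as a list of
sampled (x, y) points. The function names below are hypothetical, and ties are
broken by taking the first point met in boundary order, which simplifies the
"leftmost point" rule stated above.

```python
def signed_distance(p, a, b):
    """Signed distance of p from the line through a and b
    (positive to the left of the direction a -> b)."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    length = ((bx - ax) ** 2 + (by - ay) ** 2) ** 0.5
    return ((bx - ax) * (py - ay) - (by - ay) * (px - ax)) / length


def basis_triplets(boundary, i, j, eps=1e-12):
    """Return up to two triplets (a, b, apex), one per side of the base
    line through boundary[i] and boundary[j]."""
    a, b = boundary[i], boundary[j]
    best = {}  # side (+1 or -1) -> (distance, point)
    for p in boundary:
        d = signed_distance(p, a, b)
        if abs(d) < eps:
            continue  # point lies on the base line itself
        side = 1 if d > 0 else -1
        if side not in best or abs(d) > best[side][0]:
            best[side] = (abs(d), p)
    return [(a, b, best[s][1]) for s in (1, -1) if s in best]


boundary = [(0, 0), (4, 0), (5, 2), (4, 4), (0, 4), (-1, 2)]
triplets = basis_triplets(boundary, 0, 3)
```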


Figure  11. 


As in the previous section one can normalize such a basis triplet and arrive at a
'normal' affine invariant shape. A similar normalization can be done for each pair
of points on the scene boundary curve. The normalized shapes can now be compared.
In principle, this is the algorithm of the previous section, where instead of
the specified concavities we use all the convex basis regions defined by pairs of
points. The algorithm follows:

Preprocessing 

[a]  This  step  is  the  same  as  before. 

[b]  For  each  pair  of  points  on  the  boundary  take  the  appropriate  two  regions 
defined  by  the  line  joining  these  points  and  compute  their  footprints.  Use  each 
region's  footprint  as  an  entry  to  a  hash-table  and  record  in  the  table  the  model 
and  the  basis  triplet  for  which  this  footprint  was  obtained. 

Recognition 

[a]  This  step  is  the  same  as  before. 

[b] Choose a pair of points in the scene defining a convex basis region. Compute
the footprint of this region. For this footprint (properly quantized) check the
appropriate entry in the hash table, and extract the pairs (model, basis triplet)
appearing there. For each such relevant model with the appropriate basis triplet
compute the corresponding affine transformation between the model and
the scene, and verify their correspondence.
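
The affine transformation induced by a pair of corresponding basis triplets can
be computed directly, since three non-collinear point correspondences determine
it uniquely. The sketch below is one straightforward way to do so; the function
names are assumptions made for the example.

```python
def affine_from_triplets(model, scene):
    """Return (A, b) with A*m_k + b = s_k for the three corresponding
    points of the model and scene basis triplets."""
    (m0, m1, m2), (s0, s1, s2) = model, scene
    # Edge vectors of the two triangles, taken from the first vertex.
    e1 = (m1[0] - m0[0], m1[1] - m0[1])
    e2 = (m2[0] - m0[0], m2[1] - m0[1])
    f1 = (s1[0] - s0[0], s1[1] - s0[1])
    f2 = (s2[0] - s0[0], s2[1] - s0[1])
    det = e1[0] * e2[1] - e2[0] * e1[1]
    if abs(det) < 1e-12:
        raise ValueError("model triplet is collinear")
    # Inverse of the 2x2 matrix whose columns are e1 and e2.
    inv = ((e2[1] / det, -e2[0] / det), (-e1[1] / det, e1[0] / det))
    # A = [f1 f2] * inv, so that A*e1 = f1 and A*e2 = f2.
    a11 = f1[0] * inv[0][0] + f2[0] * inv[1][0]
    a12 = f1[0] * inv[0][1] + f2[0] * inv[1][1]
    a21 = f1[1] * inv[0][0] + f2[1] * inv[1][0]
    a22 = f1[1] * inv[0][1] + f2[1] * inv[1][1]
    b = (s0[0] - a11 * m0[0] - a12 * m0[1],
         s0[1] - a21 * m0[0] - a22 * m0[1])
    return ((a11, a12), (a21, a22)), b


def apply_affine(A, b, p):
    """Apply the map p -> A*p + b to a planar point."""
    return (A[0][0] * p[0] + A[0][1] * p[1] + b[0],
            A[1][0] * p[0] + A[1][1] * p[1] + b[1])


A, b = affine_from_triplets(((0, 0), (1, 0), (0, 1)),
                            ((2, 3), (4, 3), (2, 6)))
mapped = apply_affine(A, b, (1, 1))
```

Verification would then apply this map to the remaining model boundary points and
compare them with the scene.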

The  complexity  of  the  preprocessing  step  of  the  algorithm  is  O(n^),  which  is  worse 
than  the  non-convex  one.  However,  the  recognition  step  of  the  algorithm  for  convex 
bodies might be quite efficient. Observe that usually convex bodies intersect at
concave angles (in [6] they are called breakpoints). In most cases it will be enough to
examine only one pair of points (one base line) for each convex protrusion,
delimited by two consecutive breakpoints (see Fig. 12).


Figure  12:    Recognition  of  objects  in  Fig.  10. 

Hence, if a scene has k convex boundary subcurves (delimited by consecutive
breakpoints), the recognition stage of the algorithm will be of the order k × n,
where n is the number of object vertices.
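
To illustrate how breakpoints delimit the convex subcurves, the following sketch
assumes the scene boundary is a simple polygon listed in counterclockwise order and
treats every concave vertex as a breakpoint. This is a simplification made for the
example; the breakpoint detection of [6] is not reproduced here, and the function
names are hypothetical.

```python
def breakpoints(polygon):
    """Indices of the concave vertices of a counterclockwise polygon."""
    n = len(polygon)
    result = []
    for i in range(n):
        (px, py), (qx, qy) = polygon[i - 1], polygon[i]
        rx, ry = polygon[(i + 1) % n]
        # Negative cross product of consecutive edges means a right turn,
        # that is, a concave angle on a counterclockwise boundary.
        cross = (qx - px) * (ry - qy) - (qy - py) * (rx - qx)
        if cross < 0:
            result.append(i)
    return result


def convex_subcurves(polygon):
    """Split the boundary into runs between consecutive breakpoints
    (both delimiting breakpoints are included in each run)."""
    cuts = breakpoints(polygon)
    if not cuts:
        return [list(polygon)]
    n = len(polygon)
    runs = []
    for start, end in zip(cuts, cuts[1:] + [cuts[0] + n]):
        runs.append([polygon[k % n] for k in range(start, end + 1)])
    return runs


# Outline of two overlapping squares, [0,2]x[0,2] and [1,3]x[1,3].
scene = [(0, 0), (2, 0), (2, 1), (3, 1), (3, 3), (1, 3), (1, 2), (0, 2)]
cuts = breakpoints(scene)
runs = convex_subcurves(scene)
```

In this example each of the two runs is the visible part of one square, so one
base line per run is a natural candidate for the footprint lookup.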

For convex body recognition, we use a different footprint than the one
described in Section 6.1.2. Specifically, the footprint is calculated as follows:

1) Normalize the basis region.

2) Take the sides of the triangle, excluding the base line, and for each side
(chord) compute its distance from the boundary arc having this side as a base
(see Fig. 13). Denote these numbers by $d_1$, $d_2$.

3) For each chord (side) compute the (leftmost) point on the arc where the distance
is obtained, and compute the point on the chord closest to this point. The
distances of the above mentioned chord points to the intersection of their sides
with the base line will be denoted $r_1$, $r_2$ respectively.

The footprint is the four-dimensional vector $(d_1, r_1, d_2, r_2)$.

Figure 13: A convex shape footprint.
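
Steps 2 and 3 can be sketched as follows, under several assumptions that are ours
rather than the paper's: the basis region is already normalized and is given as a
list of boundary points running from one base line endpoint through the apex to the
other endpoint; the "distance" of a chord from its arc is read as the largest
perpendicular distance of an arc point from the chord; and ties are broken by the
first point met. The function names are hypothetical.

```python
import math


def chord_feature(arc, corner):
    """arc: boundary points from one chord endpoint to the other.
    corner: the chord endpoint that lies on the base line.
    Returns (d, r) for this chord."""
    (ax, ay), (bx, by) = arc[0], arc[-1]
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    best_d, best_t = 0.0, 0.0
    for px, py in arc:
        # Perpendicular distance of the arc point from the chord line.
        d = abs(dx * (py - ay) - dy * (px - ax)) / length
        if d > best_d:
            # Parameter of the closest chord point, clamped to the chord.
            t = ((px - ax) * dx + (py - ay) * dy) / length ** 2
            best_d, best_t = d, min(1.0, max(0.0, t))
    foot = (ax + best_t * dx, ay + best_t * dy)
    return best_d, math.hypot(foot[0] - corner[0], foot[1] - corner[1])


def convex_footprint(region, apex_index):
    """Four-dimensional footprint (d1, r1, d2, r2) of a basis region."""
    d1, r1 = chord_feature(region[:apex_index + 1], region[0])
    d2, r2 = chord_feature(region[apex_index:], region[-1])
    return (d1, r1, d2, r2)


region = [(0, 0), (0.5, 1.5), (2, 2), (3.5, 1.0), (4, 0)]
fp = convex_footprint(region, apex_index=2)
```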

Fig.  12  shows  recognition  of  the  objects  appearing  in  the  composite  scene  of 
Fig.  10c. 

7.   Finding  the  Best  Least-Squares  Match 

In  the  previous  sections  we  have  discussed  recovery  of  an  affine  transformation 
which  is  based  on  correspondence  between  triplets  of  non-collinear  points.  Since  an 
affine  transformation  is  uniquely  defined  by  such  a  correspondence,  theoretically, 
this  is  all  we  need.  However,  in  practical  applications  the  computed  transformation 
may  be  somewhat  distorted,  due  to  noisy  image  input  at  the  relevant  points. 
Knowledge  of  additional  points  which  were  transformed  to  each  other  may  help  us 
to  improve  the  accuracy  of  the  computed  transformation.  In  this  section,  we  discuss 
computation  of  an  affine  transformation  giving  the  best  least-squares  match  between 
sets  of  corresponding  points.  Only  the  problem  and  the  final  results  are  stated  here. 
The reader is referred to [2] for details of the proof. In the algorithm of point
matching (Section 4) we apply the least squares procedure after an initial
transformation has been computed by the basis correspondence procedure. The computed
transformation  induces  additional  point  correspondences,  which  are  used  by  the 
least  squares  algorithm.  In  the  non-convex  curve  matching  algorithm  (Section  6.1) 
we  apply  the  least  squares  procedure  if  more  than  one  corresponding  concavity 
between  an  object  and  a  scene  has  been  discovered. 

Assume that we are looking for an affine match between the sequences of
planar points $(u_j)_{j=1}^n$ and $(v_j)_{j=1}^n$. We would like to find the affine
transformation $Tu = Au + b$ of the plane which will minimize the $l^2$ distance
between the sequences $(Tu_j)_{j=1}^n$ and $(v_j)_{j=1}^n$:

$$\delta = \min_{T} \sum_{j=1}^{n} |Tu_j - v_j|^2$$
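
It can be shown ([2]) that if we define the following four n-dimensional vectors
($i = 1, 2$, where the superscript denotes the coordinate):

$$U^i = (u_j^i)_{j=1}^n, \qquad V^i = (v_j^i)_{j=1}^n$$

then the values of $b$ and $A$ where the minimum is obtained are given by

$$b = \frac{1}{n} \sum_{j=1}^{n} v_j$$

and

$$a_{11} = \frac{(U^1 \cdot V^1)(U^2 \cdot U^2) - (U^2 \cdot V^1)(U^1 \cdot U^2)}{\Delta}$$

$$a_{12} = \frac{(U^1 \cdot U^1)(U^2 \cdot V^1) - (U^1 \cdot V^1)(U^1 \cdot U^2)}{\Delta}$$

$$a_{21} = \frac{(U^1 \cdot V^2)(U^2 \cdot U^2) - (U^1 \cdot U^2)(U^2 \cdot V^2)}{\Delta}$$

$$a_{22} = \frac{(U^1 \cdot U^1)(U^2 \cdot V^2) - (U^1 \cdot V^2)(U^1 \cdot U^2)}{\Delta}$$

where

$$\Delta = (U^1 \cdot U^1)(U^2 \cdot U^2) - (U^1 \cdot U^2)(U^1 \cdot U^2)$$

As we can see, $\Delta$ depends only on one set of points (in this case the model
points), so we can know in advance which sets of model points will give a solution
for the minimum (those for which $\Delta \neq 0$).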
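
A minimal sketch of this closed-form fit follows. One step is our assumption
rather than something stated in the text: the model points are first translated
so that their centroid is at the origin, which is the condition under which the
optimal translation reduces to the mean of the scene points. The result is then
shifted back to the original coordinates. The function name is hypothetical.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def least_squares_affine(us, vs):
    """Least-squares affine map (A, b) with A*u_j + b ~ v_j."""
    n = len(us)
    cu = (sum(u[0] for u in us) / n, sum(u[1] for u in us) / n)
    cv = (sum(v[0] for v in vs) / n, sum(v[1] for v in vs) / n)
    # Coordinate vectors; the model points are centered (assumption).
    U1 = [u[0] - cu[0] for u in us]
    U2 = [u[1] - cu[1] for u in us]
    V1 = [v[0] for v in vs]
    V2 = [v[1] for v in vs]
    delta = dot(U1, U1) * dot(U2, U2) - dot(U1, U2) ** 2
    if abs(delta) < 1e-12:
        raise ValueError("degenerate model points: delta is zero")
    a11 = (dot(U1, V1) * dot(U2, U2) - dot(U2, V1) * dot(U1, U2)) / delta
    a12 = (dot(U1, U1) * dot(U2, V1) - dot(U1, V1) * dot(U1, U2)) / delta
    a21 = (dot(U1, V2) * dot(U2, U2) - dot(U1, U2) * dot(U2, V2)) / delta
    a22 = (dot(U1, U1) * dot(U2, V2) - dot(U1, V2) * dot(U1, U2)) / delta
    # For centered model points b is the mean of the v_j; undo the centering.
    b = (cv[0] - a11 * cu[0] - a12 * cu[1],
         cv[1] - a21 * cu[0] - a22 * cu[1])
    return ((a11, a12), (a21, a22)), b


# Points related exactly by A = [[2, 1], [0, 3]], b = (1, -1).
us = [(0, 0), (1, 0), (0, 1), (2, 3), (-1, 2)]
vs = [(1, -1), (3, -1), (2, 2), (8, 8), (1, 5)]
A_fit, b_fit = least_squares_affine(us, vs)
```

With noise-free correspondences the fit recovers the generating transformation;
with noisy interest points it averages the error over all corresponding points.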

In Fig. 14c we see an example of a fit obtained by calculating the affine
transformation from three basis points, and in Fig. 14d the same model is fitted
using the best least-squares affine match, based on 10 points, all of which,
incidentally, were recovered as corresponding points by the transformation in Fig. 14c.

Figure 14: a) A pair of pliers rotated and tilted in space (note the different lengths
of the handles). b) Extracted interest points. c) Matching based on one basis triplet
correspondence. d) Best least squares affine correspondence.


8.    Experimental  Results 

We  have  implemented  the  point  matching,  non-convex  and  convex  curve 
matching,  and  best  least-squares  matching  algorithms. 

In the first set of figures we show recognition of industrial parts (pliers) in
composite scenes. Here the point matching algorithm and least squares matching
algorithm were applied. Fig. 1 gives the original gray scale images of two models
(pliers) and their composite overlapping scene, which was also significantly tilted.
Fig. 2 shows the extracted interest points of the models and the scene, which are
points of sharp concavities and convexities. Note that in the scene some of the
original model points are occluded. On the other hand, new interest points are
generated by the intersection of the models. Fig. 3 gives the recognition results. In
Fig. 14a we see an image of the pliers of Fig. 1a rotated, translated and tilted at
about 40 degrees (observe the different lengths of both handles in the image). The
recognition algorithm was run to obtain a number of matching basis triplets.
The corresponding affine transformations were calculated, and for each such
transformation the transformed model was superimposed on the scene of Fig. 14a.
Fig. 14c shows such a transformation computed according to a basis triplet which
gives a somewhat noisy match. Although this match is noisy, one can easily
establish additional correspondences between interest points of the scene (Fig. 14b) and
interest points of the transformed original model (white line drawing in Fig. 14c).
Using these additional correspondences the least squares algorithm was applied,
resulting in the match of Fig. 14d. Fig. 15 shows the results of another recognition
experiment based on the point matching algorithm.


Figure 15: a) A composite scene of the pliers of Fig. 1 with an additional occluding object.
b) Extracted interest points. c) Recognition of the pliers.


The second set of figures deals with recognition of household items (for the
future household robot!). Fig. 4 gives the original gray scale image of a pizza
cutter, a spatula and their overlapping scene. The image of the scene was taken by a
significantly tilted camera, resulting in a distortion of the models. In Fig. 6 the
concavity entrances of the models and the scene of Fig. 4 are marked by the dashed
lines, and the concavity basis triplets are displayed. The non-convex curve matching
algorithm of Section 6.1.1 was applied to this scene, resulting in the recognition of
the pizza cutter and the spatula displayed in Fig. 7. Since the spatula is not entirely
flat, the fitted model is not as accurate as for the pizza cutter.

Finally, we have done experiments to test our convex curve matching algorithm.
Fig. 10 displays two models of convex objects and Fig. 12 gives their recognition
results. We took artificial objects made of cardboard, since we had difficulty
finding non-polygonal 'real' convex objects. We especially chose objects with a
'smooth' boundary curve so that no special anchor points on the boundary could be
identified. Fig. 16a shows another (significantly tilted) image of the model in Fig.
10a, and Fig. 16b displays its recognition and the basis region and triplet where it
was achieved. This basis region was selected in the footprint correspondence
procedure described in Section 6.2.


Figure 16: a) Tilted image of the object in Fig. 10a.
b) Recognition result with corresponding basis region.


Fig. 17 shows another overlapping scene of the convex models of Fig. 10, and
their recognition.

Figure 17: a) Tilted composite scene of the objects in Fig. 10. b) Recognition.

9.   Conclusions  and  Future  Research 

In  this  paper  we  present  methods  for  object  recognition  under  the  assumption 
of  affine  approximation  to  the  perspective  transformation.  Point,  line,  and  curve 
matching  algorithms  are  introduced.  The  algorithms  have  been  implemented  and 
successfully  tested  on  real  images. 

These  methods  naturally  extend  to  additional  cases.  The  following  issues  are 
currently  being  studied: 

1)  Recognition  of  non-flat  3-D  objects  from  2-D  images. 

2)  Implementation  of  similar  matching  procedures  based  on  synthesis  of  point  and 
line  information. 

3)  Recognition  of  objects  using  parametrized  models. 

4)  Affine  invariant  shape  representation  by  parts. 